High-Performance Library Software for QR Factorization
Authors
Abstract
In [5, 6], we presented algorithm RGEQR3, a purely recursive formulation of the QR factorization. Using recursion leads us to a natural way to choose the k-way aggregating Householder transform of Schreiber and Van Loan [10]. RGEQR3 is a performance-critical subroutine for the main (hybrid recursive) routine RGEQRF for QR factorization of a general m × n matrix. This contribution presents a new version of RGEQRF and its accompanying SMP parallel counterpart, implemented for a future release of the IBM ESSL library. It represents a robust, high-performance piece of library software for QR factorization on uniprocessor and multiprocessor systems. The implementation builds on previous results [5, 6]. In particular, the new version is optimized in a number of ways to improve performance, e.g., for small matrices and matrices with a very small number of columns. This is partly done by including mini blocking in the otherwise purely recursive RGEQR3. We describe the salient features of this implementation. Our serial implementation outperforms the corresponding LAPACK routine by 10-65% for square matrices and by 10-100% for tall and thin matrices on the IBM POWER2 and POWER3 nodes. The tests covered matrix sizes that varied from very small to very large. The SMP parallel implementation shows close to perfect speedup on a 4-processor PPC604e node.
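The recursive idea summarized in the abstract can be illustrated with a small sketch. It is not the ESSL code: it is a plain-Python illustration of a purely recursive QR factorization that aggregates Householder reflectors into the compact form Q = I - Y T Yᵀ, assuming m ≥ n and omitting the mini blocking mentioned above. The function names (`rgeqr3`, `matmul`, `transpose`, `apply_q`) and the list-of-rows matrix layout are choices made for this example only.

```python
import math

def matmul(A, B):
    """Dense product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def rgeqr3(A):
    """Recursive QR sketch: return (Y, T, R) with A = (I - Y T Y^T) [R; 0].

    A is m x n with m >= n. Y is m x n unit lower trapezoidal,
    T and R are n x n upper triangular.
    """
    m, n = len(A), len(A[0])
    if n == 1:
        # Base case: a single Householder reflector H = I - tau * y * y^T.
        x = [row[0] for row in A]
        norm = math.sqrt(sum(v * v for v in x))
        if norm == 0.0:
            return [[1.0]] + [[0.0] for _ in range(m - 1)], [[0.0]], [[0.0]]
        beta = -math.copysign(norm, x[0])
        y = [1.0] + [v / (x[0] - beta) for v in x[1:]]
        tau = (beta - x[0]) / beta
        return [[v] for v in y], [[tau]], [[beta]]
    n1 = n // 2
    A1 = [row[:n1] for row in A]
    A2 = [row[n1:] for row in A]
    # Factor the left half recursively.
    Y1, T1, R1 = rgeqr3(A1)
    # Update the right half: A2 <- Q1^T A2 = A2 - Y1 T1^T (Y1^T A2).
    W = matmul(transpose(T1), matmul(transpose(Y1), A2))
    YW = matmul(Y1, W)
    A2 = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A2, YW)]
    R12 = A2[:n1]
    # Factor the trailing block of the right half recursively.
    Y2b, T2, R2 = rgeqr3(A2[n1:])
    Y2 = [[0.0] * (n - n1) for _ in range(n1)] + Y2b
    # Aggregate the two block reflectors: T3 = -T1 (Y1^T Y2) T2.
    T3 = matmul(matmul(T1, matmul(transpose(Y1), Y2)), T2)
    T = [r1 + [-v for v in r3] for r1, r3 in zip(T1, T3)]
    T += [[0.0] * n1 + r2 for r2 in T2]
    Y = [r1 + r2 for r1, r2 in zip(Y1, Y2)]
    R = [r1 + r12 for r1, r12 in zip(R1, R12)]
    R += [[0.0] * n1 + r2 for r2 in R2]
    return Y, T, R

def apply_q(Y, T, B):
    """Return (I - Y T Y^T) B."""
    YTB = matmul(Y, matmul(T, matmul(transpose(Y), B)))
    return [[b - c for b, c in zip(rb, rc)] for rb, rc in zip(B, YTB)]

# Example: factor a tall 5 x 3 matrix and measure the reconstruction error.
A = [[2.0, -1.0, 0.5],
     [1.0, 3.0, -2.0],
     [0.0, 1.0, 4.0],
     [-1.0, 2.0, 1.0],
     [3.0, 0.0, 1.0]]
Y, T, R = rgeqr3(A)
m, n = len(A), len(A[0])
R_full = R + [[0.0] * n for _ in range(m - n)]
QR = apply_q(Y, T, R_full)
err = max(abs(a - b) for ra, rb in zip(A, QR) for a, b in zip(ra, rb))
```

In this sketch the recursion itself determines how many reflectors are aggregated at each level, which is the sense in which recursion gives a natural choice of k. A production routine would operate in place on column-major storage and call tuned matrix-multiply kernels instead of the list-based helpers used here.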
Similar resources
Multifrontal multithreaded rank-revealing sparse QR factorization
SuiteSparseQR is a sparse multifrontal QR factorization algorithm. Dense matrix methods within each frontal matrix enable the method to obtain high performance on multicore architectures. Parallelism across different frontal matrices is handled with Intel’s Threading Building Blocks library. Rank-detection is performed within each frontal matrix using Heath’s method, which does not require colu...
SCALABILITY ISSUES AFFECTING THE DESIGN OF A DENSE LINEAR ALGEBRA LIBRARY
This paper discusses the scalability of Cholesky, LU, and QR factorization routines on MIMD distributed memory concurrent computers. These routines form part of the ScaLAPACK mathematical software library that extends the widely-used LAPACK library to run efficiently on scalable concurrent computers. To ensure good scalability and performance, the ScaLAPACK routines are based on block-partitioned...
Tall and Skinny QR Matrix Factorization Using Tile Algorithms on Multicore Architectures
To exploit the potential of multicore architectures, recent dense linear algebra libraries have used tile algorithms, which consist in scheduling a Directed Acyclic Graph (DAG) of tasks of fine granularity where nodes represent tasks, either panel factorization or update of a block-column, and edges represent dependencies among them. Although past approaches already achieve high performance on ...
Enhancing Parallelism of Tile QR Factorization for Multicore Architectures
To exploit the potential of multicore architectures, recent dense linear algebra libraries have used tile algorithms, which consist of scheduling a Directed Acyclic Graph (DAG) of fine granularity tasks where nodes represent tasks, either panel factorization or update of a block-column, and edges represent dependencies among them. Although past approaches already achieve high performance on mod...
Fully Empirical Autotuned QR Factorization For Multicore Architectures
Tuning numerical libraries has become more difficult over time, as systems get more sophisticated. In particular, modern multicore machines make the behaviour of algorithms hard to forecast and model. In this paper, we tackle the issue of tuning a dense QR factorization on multicore architectures. We show that it is hard to rely on a model, which motivates us to design a fully empirical approac...